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Abstract 

Rapidly evolving non-coding plastid regions (NCPs) are currently widely used in evolutionary biology especially in plant 
systematic studies where NCPs have become one of the most commonly used tools in clarifying species relationships. 
Currently, the generally small amount of sequence variation provided by NCPs compared to nuclear regions makes plastid 
phylogeny reconstruction challenging at the species-level, especially so in species rich clades such as Solanum with c. 1,200 
species. Previous studies have established that the set of most highly variable NCPs vary between major plant families, and 
here we explore whether this variation extends beyond family level to genera and major clades within genera. Using full 
plastome data, we identify the most highly variable plastid markers in the Potato clade of Solanum. We then compare 
sequence variation between the Potato and the closely related Morelloid clade. Results show that whilst a narrow set of 
NCPs show consistently high variation, levels of sequence variation in most NCPs differ greatly between the two closely 
related clades. The high variation detected between closely related groups implies that repeated screening studies will be 
needed for individual projects despite the potential availability of results from closely related taxa, and indicates a narrower 
applicability of family-specific screening studies than previously thought. 
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Background 

Rapidly evolving non-coding plastid regions (NCPs) are 
currently used in a wide range of evolutionary studies, including 
origin of domesticated species [1-4], diversification patterns in 
biodiversity hotspots [5-6] , effect of climate change on biodiversity 
[7-8], and molecular barcoding [9-10]. In plant systematics, 
NCPs have become one of the most commonly used tools in 
elucidating species relationships, especially so in groups with 
complex evolutionary histories involving hybridisation, polyploidy 
and/ or introgression, where plastid gene trees are used as baseline 
data for resolving true species trees due to their uniparental 
inheritance [11-13]. 

Despite past efforts to identify the most rapidly evolving plastid 
markers across land plants [10,14—16], the generally small amount 
of sequence variation provided even in the most fast evolving 
NCPs as compared to nuclear regions is still limiting their use in 
phylogenetic studies. This generally low variation found in NCPs 
compared to more rapidly evolving nuclear regions means that 
more sequencing is needed to achieve equivalent resolution, 
making the use of plastid markers costly and time consuming 
especially so in large species-rich groups [17—18]. In fact, in many 
large groups, lack of plastid sequence variation has led investiga- 
tors to re-direct their efforts towards the nuclear genome to look 
for more highly variable, single copy nuclear regions which could 
be used as a more cost-efiicient way to derive robust phylogenies 
[18-23]. 



The advances in next generation sequencing technologies 
(NGS) during the past five years have brought solutions for 
buUding species-level plastid phylogenies. Whole plastid genomes 
can now be generated for multiple species with much reduced cost 
through massive parallel sequencing [24—26]. Most systematisists 
have yet to capitalise on NGS approaches largely due to a lack of 
technical expertise required to assemble and analyse NGS data. 
Studies are, however, now appearing, and are showing the 
usefulness of full plastome data in resolving broader level questions 
at family or order level [25,27-29], in exploring sequence 
variation within species [30-31], and in elucidating relationships 
between closely related species [32] . 

Although fuU plastid genome sequencing has become an 
increasingly realistic option for molecular phylogenetic studies of 
relatively small genera with up to 150 species (e.g.. Pirns [32]), such 
solutions are yet not practical for larger genera such as Solanum 
which includes c. 1,200 species. For such species-rich clades, cost 
of sequencing full plastid genomes is stiU large, and there are clear 
advantages of sequencing more species with fewer base pairs 
compared to fewer species with full genomes in molecular 
systematic projects. A trade-off approach has been adopted where 
only a few fuU plastid genomes are generated, and then used for 
developing highly variable plastid markers that can be sequenced 
with traditional Sanger sequencing. Such a screening approach 
has thus far been used in Solanaceae, Asteraceae, and Poaceae to 
identify a set of most variable plastid markers [29,33—35]. 
Interestingly, results from these studies have revealed that there 
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is considerable variation and surprisingly little (c. 12-25%) overlap 
in the most variable markers between families [29,33-35]. These 
findings indicate that there may be only a few universally variable 
NCPs across land plants, and that more individual, family-specific 
screening studies wiU be required to identily the most highly 
variable markers for individual clades. 

In this paper we explore whether variation in the most highly 
variable plastid markers extends beyond family level to genera and 
major clades within genera. We specifically ask: (1) Do genera, and 
clades within genera, share the same h^per-variable plastid 
regions, and (2) Can full genome sequences of closely related 
species be used to find and develop new hyper-variable plastid 
markers for related taxa? We use the mega-diverse genus Solanum 
as our case study, a genus in which traditionally used plastid 
markers have proven to provide little variation. We identify the 
most rapidly evolving NCPs based on the three available full 
plastid genomes, all from the Potato clade of Solanum (S. tuberosum 
L., S. bulbocastanum Dunal, and S. lycopersicum L). We then compare 
sequence variation in the most variable regions within the Potato 
clade with sequence variation in the same set of markers in the 
relatively closely related Morelloid clade of Solanum. In addition, 
we screen seven NCPs which have been previously used in 
molecular systematic studies in Solanaceae. We discuss variation 
across the newly identified and previously used NCPs in the 
context of plastid marker selection for phylogeny reconstruction. 

Results 

Full Plastome Screening 

The thirteen most variable plastid markers identified from the 
fuU plastome sequences of three Solanum species included atpB- 
rbcL, clpP-psbB, ndhF, ndhF-rpl32, petL-psaJ (petL-petG-tmW- 
trnP-psaJ), petN-psbM, rpl32-trnL, rpoCl-rpoB, trnA-tml, trnK- 
rpsl6, and ycfl (parts 1-3) (Table 1, Figure 1). Three of these 
regions included considerable proportions of coding regions: ndhF 
(92% coding), petL-psaJ (27'yii coding), and ycfl (parts 1-3) (all 
coding). Most substitutions obser\-ed within these coding regions 
were synonymous, and Ka/Ks ratios indicated neutral or purifying 
selection. 

Of the thirteen markers, four [ndhF-Tpl32, rpl32-trnL, and tmK- 
rpsl6, ndhC-trnV) were amongst the widely used plastid markers 
published by Shaw et al. [16], of which ta^-rpsl6 was identified as 
the most variable plastid marker in Solanum in a previous study that 
focused on exploring plastid variation within Solanaceae but not 
within Solanum in particular [33]. One of the markers identified, 
ndhF, has been widely used in molecular phylogenetic studies in 
Solanaceae [57]. The remaining regions have not been previously 
used in Solanaceae. 

Traditional Screening 

The most variable markers identified from the full plastid 
genomes of the Potato clade of Solanum were screened in the closely 
related Morelloid clade together with a set of already published, 
commonly used plastid markers in species-level phylogenetic 
studies (Table 1). There was no notable length variation in any 
of the regions between the two clades (Table 1). Three regions, 
namely parts 1 and 3, and rpl32-tmL, performed consistendy 
well and were ranked within the top six most variable regions for 
both clades (Table 1). The results were the same whether 
considering the pure number of variable sites, or variable sites 
including indels (Table 1). Variation was not always correlated 
with the length of the region: Results agree for most of the top six 
regions whether considering the actual number or percentage of 
variable sites, with the exceptions oiycfl part 3 ?inA petL-psaJ -which 



both show high number of actual variable sites but lower 
percentage values compared to tmK-rpsl6 and ndhF-rpl32 
(Table 1). Because our aim was to find markers that would 

provide the maximum number of variable characters for 
phylogeny estimation with minimum sequencing cost, we focus 
our discussion on the top ranking regions based on the pure 
number of variable sites rather than the percentage. 

The top three most variable plastid regions identified in the 
Potato clade did not correspond to the three most variable regions 
detected in the Morelloid clade, independent of how variation was 
measured (Table 1). The three most variable regions in the Potato 
clade included clpP-psbB, rpl32-trnL, and ycfl part 1, whilst ndhC- 
tmV, and ycfl parts 1 and 2 were the most variable regions in the 
Morelloid clade (Table 1). The most variable region in the Potato 
clade, clpP-p.shB, ranked 13* in the Morelloid clade (Table 1). 
Similarly, the most variable region in the Morelloid clade, ndhC- 
tmV, ranked 14"' in the Potato clade (Table 1). 

Variation within the Inverted Repeat 

Some of the most highly variable markers detected in the Potato 
clade were found within the Inverted Repeat region (IR) of the 
plastid genome, including tmA-tml and ycfl part 3. We included 
the counterpart of tmA-tml from the IR copy B in our screening 
{tml-tmA) in order to assure that the two repeat segments are 
identical, which is expected based on the fact that the two IR 
copies evolve in concert [36]. Both copies were found identical in 
reverse complement in the two clades, and hence only tmA-tml 
results were scored (Table 1). We did not sequence both copies of 
\htycfl part 3 but assumed that these would be identical due to the 
concerted evolution of IR. 

Plastid versus Nuclear Sequence Variation 

The average number of variable sites, including indels, in the 
most variable plastid markers identified in the Potato clade was 25 
(2.5%), and 40 (4.2%) for the MoreUoid clade (Table 1). The best 
performing plastid marker provided 48 (4.5%) variable sites 
including indels in the Potato clade, and 80 (9.0%) in the 
Morelloid clade (Table 1). The plastid marker variability (Potato 
4.5%, Morelloid 9.0%) was still lower in both Solanum clades when 
compared to variation found in the most commonly used nuclear 
markers ITS and waxy (Potato 12.2% and 8.7%, respectively; 
MoreUoid 13.5% and 11.5%, respectively) (Table 1). The 
difference between the plastid and nuclear markers was only 
slight, however, when comparing the actual numbers of variable 
characters, including indels: the most variable plastid marker 
provided 80 variable sites, including indels, in the MoreUoid clade, 
whUst ITS and waxy provided 83 and 82 sites, respectively 
(Table 1). 

Discussion 

A study by Daniell et al. [3,3] compared four plastid genomes 
across Solanaceae and identified 21 most variable intergenic 
regions within the family. Their study included two Solanum plastid 
genomes, S. bulbocastanum and S. lycopersicum, but did not specificaUy 
aim to identify the most variable plastid regions for Solanum. Since 
their study, the potato plastid genome has become available, and 
by comparing the thrive plastomes, we aimed to identify the most 
variable plastid regions for Solanum that could be used for buUding 
robust plastid phylogenies in species-level studies. 

Variation across Clades 

Whether measured by variable sites, or by the combination of 
variable sites and indels, our results show that the set of most 
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Figure 1. Sequence variation across tlie Solanum full plastome sequences. Sequence variation across the three full plastid genomes of 
Solanum (S. tuberosum, S. bulbocastanum, and S. lycopersicum). The graph shows sequence similarity (%, shown in pale red) in relation to the 
reference sequence of S. tuberosum which was used to annotate the alignment. Grey arrows above the alignment indicate genes and their 
orientation. Thick black lines show the position of the Inverted Repeat (IR) regions. The most variable regions detected based on a sliding window 
analysis are highlighted in grey. BULBO = S. bulbocastanum, TOIVIATO (S. lycopersicum). 
doi:1 0.1 371/journal.pone.0082266.g001 



variable plastid regions vary between the two closely related clades 
of Solanum. In fact, most markers with high sequence variation in 
one clade showed more modest levels of variation in the related 
clade. Our results are in agreement with previous studies that have 
aimed to identify plastid regions that show consistently high 
sequence variation across a wide range of plant groups [15—16,33— 
34] . These studies have acknowledged that there is no single set of 
NCPs that can be universally apphed in species-level systematic 
studies across plant taxa [15-16]. In fact, comparison between the 
various family-specific screening studies showed that there are 
large dilferences between the most variable plastid regions in terms 
of sequence variation between major plant lineages and families 
[33-34], but the fact that large variation exists even between 
closely related clades within genera has not been discussed. 

Although we present only a small case study, our results have 
implications for all studies aiming to identify a set of most variable 
NCPs based on only a few full plastid genomes. The differences 
found in the most variable plastid markers even between closely 
related groups, such as found here in the two Solanum clades, 
indicate that there is litde universality in sequence variation in the 
plastid genome even at this close taxonomic level. The results 
imply that screening studies will be needed for individual projects 
despite the availability of results on plastid marker variability from 
a related group. For example, investigators working with other 
clades of Solanum should perform their own screening study in 
order to find the most variable set of plastid markers for their 
group of interest. Investigators working at species-level in specific 
clades within families for which general screening studies have 
been done (e.g., Asteraceae [34] and Poaceae [29,3,5]) should 
continue to explore serjuence variation at a lower taxonomic level 
and not draw conclusions only based on the overall patterns of 
sequence variation observed across these families. 

Variation within the Inverted Repeat 

Unexpectedly, results from our full plastome screening showed 
that two IR regions, tmA-tml and jcfl part 3, were amongst the 
most highly variable plastid markers in the Potato clade of Solanum, 
and in the case ofycfl part 3, also in the MoreUoid clade. In order 
to assure that the two segments of the IR are evolving in concert as 
expected, ser[uenc('d both IR copies of the trnA-tml ixAvon. Both 
copies were found to be identical, indicating that the IR structure 
is unchanged. What explains the elevated sequence variation 
within the two IR regions tmA-tml and ycfl remains largely a 
myster)', because the IR is generally extremely slowly evolving 
amongst angiosperms [37]. 

In the case of the large genejc/? that spans over the SSC and 
IR, studies have shown it to be amongst the most highly variable 
regions in several plant families [30,32,38-39]. The use oiycfl in 
molecular studies might be limited, however, due to evidence that 
it is evolving under strong positive selection in some groups [32]. 
No evidence was detected here that the ycfl is evolving under 
positive selection in Solanaceae based on K^/K, ratios, but its use 
in molecular phylogenetic studies should be viewed with caution. 

Plastid versus Nuclear Sequence Variation 

We identified ten new highly variable plastid markers based on 
the fuU plastome comparison of the three available genomes of the 



Potato clade of Solanum. These new plastid markers are now 
available for screening studies for molecular systematic projects on 
Solanum, and based on our preliminary tests, the new primers can 
also be used across Solanoideae and in some cases across the whole 
of Solanaceae. The new markers not only provide help in finding 
highly variable plastid markers needed for building robust, densely 
sampled species-level phylogenies, but can be used for finding 
potential barcodes for particular clades within the mega-diverse 
genus [40]. Ultimately, however, full plastome sequencing with 
next-generation sequ(-ncing technologies [32] will be the most 
cost-effective way of producing plastid serjuence data, but 
meanwhile, highly variable short sequence reads can be expected 
to remain in use at least in small scale studies, as well as in studies 
relying on highly fragmented starting materials such as museum 
DNA and scatsample studies [41-42]. 

Despite our best efforts to identify highly variable plastid regions 
in Solanum, the best performing markers in both the Potato and the 
MoreUoid clades were stiU marginally outperformed by the 
commonly used nuclear markers ITS and waxy in terms of the 
amount and percentage of sequence variation. The differences are 
small, however, especially so in the MoreUoid clade where the best 
performing plastid marker contained 80 variable sites compared to 
ITS and waxy, which had 83 and 82 variable sites, respectively. 
AvaUabUity of such highly variable plastid markers for molecular 
systematic studies wUl be valuable, as they wUl help in making data 
generation more cost effective. 

Particular markers merit special attention. One of the markers 
tested, Tpl32-tTnL, provided consistently high amounts of sequence 
variation across the two Solanum clades. This marker has been 
commonly used across angiosperms in both famUy- and species- 
level stucUes [15-16]. The marker has not yet been widely used in 
Solanaceae, but our results indicate that it should be included in 
initial screening studies. In fart, most plastid markers identified 
here show higher serjuencx variation compar(;d to tlu; markers that 
have been commonly used in molecular phylogenetic studies of 
Solanum (e.g., tmT-F and psbK-I). This indicates that considerable 
amount of time and money can be saved in molecular 
phylogenetic studies if initial screening studies are performed with 
additional markers. 

Implications for Barcoding Large Genera? 

Our results have potential implications for species-level barcod- 
ing studies, especially so for sttidies involving large clades such as 
Solanum. Previous studies have demonstrated failure of a single or a 
small set of barcodes to discriminate between closely related 
species e.g., [40,43-50]. Fazekas et al. [49] tested whether 
discrimination power could be increased by using a larger set of 
barcodes, but found that discriminating between closely related 
species faUed in c. 30% cases even when up to seven plastid 
markers were used. Results presented here could partially explain 
the generally low success rate for barcoding such groups: if 
variation in plastid marker sequence variation is high across 
orders, famUies and even between closely related groups (genera, 
and clades within genera), a larger set of plastid markers wUl have 
to be used especially so in species-rich groups such as Solanum, as 
only a subset of markers are likely to show adequate levels of 
sequence variation throughout individual clades. Previous studies 
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Table 2. Primer details. 



Region 


Primer 


Primer sequence 


Reference 


ntnR-rhrl 
UlfJO I ULL 


atpB F 


ArA ccc r;Ar f^Ar tat Arx Tf^ 






rbcL R 


cct^ hhr err LCd hrr ai^a a^; 

uurt f\f\\. f\\jn HLL Hurt rtvj 




Cipr pSuD 


cIpP F 


err. fAT f^TA fl^i^ TTr" TTA Ai^ 
Lrt 1 u I rt Luu 1 \\. L 1 rt rtu 






psbB R 


Tr~r~ TAA rrf^ a at i^at crT 

\ 1 rtrt rtrt 1 urt 1 1 urt 




ndhF 


ndhF2 F 


TTr" rrr aat ttt r~r;r" aat a 

1 IL rtrt 1 1 1 1 LuL Hrt 1 H 






nanrz_r\ 


Tcr ATT f"Tr Af"r tta rA(^ A(^a ta 

1 f\\.\ L 1 V_ rtLL 1 IH V-rtu Hurt Lrt 




nunr rpijz 


rpL32-R 


rrA ATA Trr rir vvr ttt rrh a 

L-^H rt 1 rt 1 L 1 1 111 III >.wLrt rt 


jnaw ei ai. lioj 




trnL'^'^^' 


CAA ACC tat KAT rrA Yf^lWl ATA TT 
\jrtrt rtuo 1 rt 1 r\rt 1 v-v-rt I oivi H 1 rt II 


JllaW UL a\. il<-)l 


ptiiL p^uJ 




TCC CAT TCA AAA ACC TCC TT 

1 \-.H 1 1 \Jr\ rtrtrt rtl^l^ 1 1^1^ 1 1 






rpL33 R 


AAT TTA err CCT TCA TCC TT 

rtrt 1 1 1 rt VJl^l^ 1 1 V-rt 1 VJl^ 1 1 




psbK-l 


psbK 


TTA err TTT eTT Tee caa e 

\ \ rt uLL 1 1 1 u 1 1 1 L-rtrt u 


Uli-illinz-iciAii-trtl-i at ral Tim 

noiiinysworLii ci ai. i\\J] 




psbl 


Al^A l^TT Tl^A eAe TAA eC A T 
HvjH u 1 1 1 uH urtu 1 rtrt u>.wrt 1 


Uli-illInriciAfni-tki at al Tim 

noiiiny sworcn ei ai. 


ml J "J tml 

rpiDZ irnL 


trnL"^'^^' 


CTe CTT rcT AAe Aer ac-c eT 


jnaw ei ai. lioj 




rpL32-F 


r Ae TTr rAA aaa aat i^ta ttt r 

\.t\\J \ \ \- LHH f\f\f\ HHL U 1 H 1 1 


zjnaw ei ai. lioj 


irni\ rps i o 


rpSl 6 


AAA i^Tr; eeT ttt tat eAT rr 

f\f\f\ \j \ \j \J\J \ III 1 H 1 uH 1 


jnaw ct ai. iioj 






TTA AAA err eAe TAr Tr~r Arr 

\ IH rtrtrt vjV-'— urtvj 1 rt^ ILI rtLL 


jFiaw ei ai. iioj 


iiiir\ iiiii \\nai 


H i IL^UMU^ r 


CTT TTr TTT Ter rer att Tr 

li, 1 1 1 1 III 1 rtl 1 1^ 








Trr TTT rTr rAr f^rT eAA eT 

1 1— III 1 \^ VJrtl— OVJ 1 VJrtrt VJ 1 






LI 1 lr\\UUl^J 1 r\ 


Arr Ari^ t^rT rrT rTr TTr Tr 

rtV-V- rtV— U VJV_ 1 1 1 1— 1 1 V— 1 v_ 






tt-nl tCtL\\\ 1 P 


Trr rAT TTr f^AT TTr rAr Tr 

ILL LH 1 1 IL vjrt 1 1 \ \. Vjrtu IL 




irnt irn/\ \\r\D) 


trnl^A 1 1 P 

irnuAU_r 


rAr ArT Trr ArA rrr rAr ta 

L.HV_ rtL 1 1 rtuH uHj LHu 1 H 






TrnijMl__ri 


Trr TTr rrr TrA TrT rrT Ar 

1 LL 1 1 o Vjuo 1 oH 1 L 1 Llj 1 rtu 




trnL-F 


tabC 


rrA AAT rrr TAr Arr rTA rr 

L-Urt HH 1 Luu 1 rtvj rtLu LIH Lu 


TaKarlat at al Fl /ll 

laDeriei ei ai. li^"] 




tabF 


ATT TrA ArT rrT rAr Arr Ar 

H 1 1 1 oH rtL 1 1 urtl- rtLu Hu 


Tal-iarlat at al [1/11 

laDsiisi ei ai. Li'fj 




trnC"^^"^** 


rAA Trr AAr rrr rAT rrT TAr 

uHH 1 HHL V_Hj LH 1 Lu 1 1 Hu 


jFiaw ei ai. lioj 






AAr Trr TAr AAr rrA tta rrA ATr 

rtrtL 1 LO 1 rtL rtrtL VJVJrt 1 Irt ULrt rtIL 


<iha\A/ at a! fl fil 


tfnT-L 


tab A 


rAT TAr AAA Trr rAT rrr rT 

Lrt 1 1 rtL rtrtrt 1 VjV^ Utt 1 VJL 1 LI 


idUciicL cL di. L 1 




tabD 


rrr rAT ArA rrr ArT TrA Ar 

uuLj Lirt 1 Hurt uuu rtL 1 1 urt rtL 


TaKarlat at al Fl /ll 

laoeriGi 6i ai. li^"] 


r>Ah(~ trrtU 

nunL. irnv 


ndhK F 


Arr rrA rAr ArA rAr rTA rr 

Huu LLH urtu HLH uHL L 1 H Lu 






atpE_R 


rrT TTr rrr ATr rAT aaa rT 

LL 1 1 IL uLL rtlu LrtI rtrtrt LI 






ndhK_l_F 


TTA CCT CGA CCT AGC GAA GC 






atpE_l_R 


GGA ATT GCC ATC TCA AGA TTT 




ycfl part 1 


Ycf1_F 


TCA AAG GCG CAA AAC ATT TA 






Ycf1_le_R 


GTT GTG TTT GGA CGT GTT GG 




ycfl part 2 


Ycf1_la_F 


CGA AAG CGA CCT TCA TTT TT 






Ycf1_R 


TCA GTC GAA GCA GGA GAC AA 




ycfl part 3 


trnA_F 


AAT AAC ACG GGG AAT CTA GAA AA 






trnA_R 


AAA TGT TTT GCG CCT TTG AG 





Details of all the primers used in this study. 
doi:1 0.1 371 /journal.pone.0082266.t002 



have identified that the failure of multiple barcodes in plants has 
not been due to lack of sequence variation provided by the chosen 
barcodes [51], but what has not been discussed is the phylogenetic 
or taxonomic depth at which observed parsimony informative 
characters occur. Poor discrimination power despite high sequence 
variation could indicate that the sequence variation retrieved with 
barcodes has not been at the appropriate phylogenetic level. We 
suspect this to be the case in many species rich groups based on 
our results presented here for Solanum. 



Conclusions 

Our study highlights the conclusions of Shaw et al. [16] that 
instead of focusing on finding the most highly variable plastid 
marker(s), phylogenetic studies should keep screening a set of 
potential markers to eliminate the many slowly evolving regions 
from the more highly variable ones. Results from our study reveal 
that this applies not only to family-level studies but to studies 
working between closely related groups such as genera or clades 
within genera. Based on our case study in the mega-diverse genus 
Solanum, there is great variation in the set of most highly variable 



PLOS ONE I www.plosone.org 



6 



November 2013 | Volume 8 | Issue 11 | e82266 



Predicting Plastid IVIarlcer Variation 



markers even between closely related groups. New primers for a 
set of highly variable plastid markers are now available that can be 
used to explore and screen plastid sequence variation in molecular 
phylogenetic studies of Solanum and Solanaceae in general. 

Methods 

Full Plastome Screening 

All three currently available full plastome genomes for Solanum, 
including S. tuberosum (DQ231562), S. bulbocastanum (DQ347958), 
and S. lycoperskum (DQ347959) [33] were used to screen for the 
most variable regions across the plastid genome. Plastomes were 
Eiligned using ClustalW as implemented in BioEdit v7.0.9 [52]. 
Only minor manual adjustments were necessary due to the fact 
that there are no differences in gene content or gene order 
between the three Solanum species [33]. Variation across the fuU 
genomes was visualised using mVista [53] using S. tuberosum 
genome as a reference serjuence. Sliding window analysis was run 
in the software DnaSP v5 [54] with a sliding window size 1,000 bp 
and TOO bp intervals in order to find the most variable c. 1,000 bp 
regions across the three genome alignment. Gaps were accounted 
for in calculating the window size. The top 20 regions with most 
variable sites were then furth(u- ranked based on the amount of 
variable characters, including indcls. Indels were recorded based 
on the gap coding method by Simmons & Ochoterena [56]. 
Because the DnaSP sliding window analysis does not account for 
indels in variable characters, we manually recorded the number of 
indels present within each of the regions. Indels were accounted 
for because they are commonly found in NCPs and present a 
valuable source of information for phylogenetic analyses. Primers 
were designed for all new regions using the Primer3 software 
online (http://primer3.ut.ee), whilst publishc-d prim(;rs were- used 
for regions already in use (Table 2). Primer design failed for petN- 
psbM and rpoCl-rpoB despite repeated efforts, and hence these 
regions were abandoned. The marker ndhC-tmV was chosen 
instead, because it showed high levels of both sequence divergence 
and high number of indels across the three full plastomes. Primer 
development and PGR optimisation for the other markers was 
easy, and good quality sequences were generated for the closely 
related Morelloid clade (12 species) to compare sequence variation 
across regions in the two closely allied Solanum clades. 

Traditional Screening 

The newly developed primers for the most variable plastid loci 
identified for the Potato clade were used to screen for the most 
variable regions in the closely related Morelloid clade of Solanum 
which belongs to the same major clade oi Solanum [55]. Screening 
was done using twelve of the total c. 65 Morelloid species 
representing all five morphologically delimited sections (File SI). 

In order to compare the variation found in the most variable 
regions within the Potato clade with the currently most commonly 
used markers in molecular systematic studies of Solanum, a further 
six markers were screened. These included the trnT-L and tmL-F 
intergenic spacers, which are amongst the most commonly used 
plastid markers in molecular systematic studies in Solanum [14], 
psbK-I, which has been proposed as a plant barcoding marker [10], 
and tniD-T, which is a commonly used marker in species-level 
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Sequence Variation 

We measured the number and proportion of variable sites 
within each region. The number of PI characters was only 
recorded for the Morelloid clade, due to the fact that PI characters 
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Previous studies have, however, shown that the proportion of PI 
and variable sites is positively correlated [29], and hence we are 
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for the Morelloid and the Potato clade can be found in File 82 and 
File S3, respectively. 
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